An Authorship Attribution for Serbian
Abstract
Authorship attribution is the problem of identifying the author of an anonymous or disputed text, given a closed set of candidate authors. Because natural languages are rich and there are numerous ways of expressing individuality in writing, the task draws on all sources of language knowledge: lexis, syntax, semantics, orthography, etc. Impressive results for n-gram based algorithms have been reported in many papers and for many languages. The goal of our research was to test whether this group of algorithms works equally well on Serbian and, if so, to determine the optimal values of the parameters that appear in the algorithms. We also wanted to test whether a syllable-based word decomposition, which is closer to how humans decompose words than n-grams are, can be useful in authorship attribution. Our results confirm the good performance of the n-gram based approach (accuracy up to 96%) and show the potential usefulness of the syllable-based approach (accuracy from 81% to 89%).
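As an illustration of the n-gram based family mentioned in the abstract, the following is a minimal sketch of closed-set attribution with character n-gram profiles. The abstract does not say which algorithms or parameter values were used, so everything here is an assumption made for illustration: the function names (`ngram_profile`, `dissimilarity`, `attribute`), the defaults `n=3` and `size=500`, and the relative-difference dissimilarity, which is one measure often paired with such profiles.

```python
from collections import Counter


def ngram_profile(text, n=3, size=500):
    """Relative frequencies of the `size` most frequent character n-grams.

    `n` and `size` are illustrative defaults, not values from the paper.
    """
    # Lowercase and collapse whitespace so layout does not affect the profile.
    text = " ".join(text.lower().split())
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.most_common(size)}


def dissimilarity(p, q):
    """Sum of squared relative differences over the union of two profiles.

    An n-gram missing from one profile counts as frequency 0 there.
    """
    score = 0.0
    for gram in set(p) | set(q):
        fp, fq = p.get(gram, 0.0), q.get(gram, 0.0)
        # The mean is positive because the n-gram occurs in at least one profile.
        score += ((fp - fq) / ((fp + fq) / 2)) ** 2
    return score


def attribute(text, candidates, n=3, size=500):
    """Return the candidate whose profile is closest to the unknown text.

    `candidates` maps an author name to that author's known writing.
    """
    unknown = ngram_profile(text, n, size)
    scores = {
        author: dissimilarity(unknown, ngram_profile(sample, n, size))
        for author, sample in candidates.items()
    }
    return min(scores, key=scores.get)


# Toy strings only; real use needs substantial text per candidate author.
toy_candidates = {"A": "aaaa bbbb aaaa bbbb", "B": "xyz xyz xyz xyz"}
closest = attribute("aaaa bbbb", toy_candidates)
```

In a setup like this, the n-gram length and the profile size are the kind of parameters whose optimal values would have to be found per language, for example by measuring accuracy on held-out texts over a grid of `n` and `size`.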
Similar resources
N-gram Based Text Classification According To Authorship
Authorship attribution studies consider the identification of the author of an anonymous text. This is a problem with a long history and a great number of various approaches. Those based on n-grams stand out for their performance and good results. An n-gram approach is language independent, but the selection of the number n is not. The focus of this paper is determination of a set of optimal values...
A Survey on Authorship Analysis
The paper discusses the problem of authorship analysis and its different types, such as authorship attribution, authorship identification, authorship profiling, and plagiarism detection. It also addresses the issues in Indian language text. Keywords: authorship attribution, authorship profiling, plagiarism detection, text classification.
More than Word Frequencies: Authorship Attribution via Natural Frequency Zoned Word Distribution Analysis
With the increasing popularity and availability of digital text data, the authorship of digital texts cannot be taken for granted due to the ease of copying and parsing. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then a basic authorship attribution scheme and an open authorship attribution scheme for digital texts based ...
Application of Information Retrieval Techniques for Source Code Authorship Attribution
Authorship attribution assigns works of contentious authorship to their rightful owners, solving cases of theft, plagiarism, and authorship disputes in academia and industry. In this paper we investigate the application of information retrieval techniques to attribution of authorship of C source code. In particular, we explore novel methods for converting C code into documents suitable for retrie...
Authorship Attribution in Bengali Language
We describe authorship attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and a learning curve to assess the relationship between amount of training data and test accu...
Publication date: 2012